One way to perform and report an analysis is to do the analysis in R, then copy and paste your results and plots from R into Word, or similar.
But this isn’t a very good way of doing things because:
The idea of R Markdown is that we write both text and R code into our input (.Rmd) file. We then ‘knit’ the file to produce the output document.
The R code is written into ‘code chunks’ like this:
```{r, eval=TRUE}
sample.mean <- mean(rnorm(100))
```
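As a small sketch of what happens to a chunk like this when the file is knitted: the code is run, and any objects it creates (here `sample.mean`) can be used by later chunks and in the text. The `set.seed()` call and the rounding are additions for illustration only, so that the number is the same on every run.

```r
# Assumption: fix the random seed so the result is reproducible
set.seed(1)

# Same calculation as in the chunk above: the mean of 100 standard normal draws
sample.mean <- mean(rnorm(100))

# The object is now available to any code that comes later in the document
round(sample.mean, 2)
```

Without the seed, the value would change each time the document is knitted, since `rnorm()` draws new random numbers on every run.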
Code can also be placed within the text itself using an in-line code expression, for example “`r sum(1:100)`”, which then puts the answer in the right place in the text. To put in an in-line code expression, type `r, followed by the code, followed by ` to end it.
Save the files markdownExample.Rmd and Cars.csv in the same folder.
The .Rmd file is the input file that will be knitted (meaning “rendered”) to produce the output. Read through and try to guess what each part does.
Then click on the button that says Knit.
Change the title, put yourself as the author, and put in today’s date.
Add a new section, just before the Conclusion section, called ‘Bits I am adding’. In this section, add a new figure containing a plot of cars.data$'MPG (city)' versus cars.data$'MPG (highway)'. Give the figure a sensible caption, and write a sentence in the text referring to the figure.
The command table(cars.data$Type) summarises the different types of vehicle in the data set. The command barplot(table(cars.data$Type)) shows the summary as a bar plot. Add this bar plot to the report.
Add a sentence saying “The cheapest car in the data set (in thousands of dollars) costs” then use an in-line code expression with the command min(cars.data$'Minimum price') to finish off the sentence automatically.
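The commands mentioned in these tasks can be tried out at the console before putting them into the report. The sketch below uses a small made-up data frame as a stand-in for `cars.data` (the values are illustrative only and are not taken from Cars.csv), and assumes the column names contain spaces and brackets as in the tasks above. It is one possible way of exploring the commands, not a model answer.

```r
# Made-up stand-in for cars.data (illustrative values, NOT from Cars.csv).
# check.names = FALSE keeps column names such as "MPG (city)" unchanged.
cars.data <- data.frame(
  Type = c("Small", "Small", "Van", "Midsize"),
  `MPG (city)` = c(25, 29, 18, 21),
  `MPG (highway)` = c(31, 36, 23, 27),
  `Minimum price` = c(12.9, 8.4, 14.5, 17.5),
  check.names = FALSE
)

# Scatter plot of one MPG column against the other
plot(cars.data$`MPG (city)`, cars.data$`MPG (highway)`,
     xlab = "MPG (city)", ylab = "MPG (highway)")

# Counts of each vehicle type, then the same summary as a bar plot
type.counts <- table(cars.data$Type)
barplot(type.counts)

# The value an in-line expression would insert at the end of the sentence
cheapest <- min(cars.data$`Minimum price`)
paste("The cheapest car in the data set (in thousands of dollars) costs", cheapest)
```

Column names that are not valid R names can be written in backticks, as here, or in quotes after the `$`, as in the tasks above.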
The “source” markdown files are plain text. This makes them well suited to version control, to keep track of edits.
Git is very well suited to version control (and very widely used).
Git can store the data too.
So everything stays in sync, can be “rolled back” to an earlier version, is easily shared, and is backed up.
ggplot2
Base R plotting is very powerful and versatile.
But making more complex plots often requires long (and hard-to-read) code.
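As a rough illustration of this point, here is one way a labelled scatter plot with one group of points highlighted might be written in base R. It uses the `MASS::hills` data (Scottish hill races); the colours, point style and legend position are arbitrary choices made for this sketch.

```r
# Example data: MASS::hills (Scottish hill races)
hills <- MASS::hills

# TRUE for races whose name contains "Ben"
is_ben <- grepl("Ben", rownames(hills))
point_colours <- ifelse(is_ben, "red", "black")

# Scatter plot, with the "Ben" races in a different colour
plot(hills$climb, hills$time,
     xlab = "climb", ylab = "time",
     col = point_colours, pch = 19)

# Add the race names next to the points (overlapping labels are not avoided)
text(hills$climb, hills$time, labels = rownames(hills),
     pos = 4, cex = 0.6, col = point_colours)

# The legend has to be built by hand
legend("topleft", legend = c("other", "name contains 'Ben'"),
       col = c("black", "red"), pch = 19)
```

Each extra feature (colours, labels, legend) needs its own call, and the colour mapping has to be repeated by hand in each one.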
ggplot2 is a tidyverse package designed to work nicely with “tidy” data. It is loaded with library(ggplot2).
ggplot has a steep learning curve at first: it takes some getting used to, even for making “easy” plots!
But on the other hand, later on it becomes quick to make complex ones.
Here are some examples:
MASS::hills |> ggplot(aes(x = climb, y = time))
MASS::hills |> ggplot(aes(x = climb, y = time)) + geom_point()
MASS::hills |> ggplot(aes(x = climb, y = time, label = rownames(MASS::hills))) + geom_point() + geom_text()
library(ggrepel)
MASS::hills |> ggplot(aes(x = climb, y = time, label = rownames(MASS::hills))) + geom_point() + geom_text_repel()
MASS::hills |> ggplot(aes(x = climb, y = time, label = rownames(MASS::hills), colour = grepl('Ben', rownames(MASS::hills)))) + geom_point() + geom_text_repel()
Simon Preston & Richard Wilkinson 31/1/2023